Computer and Modernization (计算机与现代化) ›› 2012, Vol. 1 ›› Issue (9): 232-234. doi: 10.3969/j.issn.1006-2475.2012.09.061

• Applications and Development •

Integrated Method of Information Search for Protein from Different Resource of Database

CHEN Ya-qi (陈雅琦), ZHU Fei (朱斐)

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  • Received: 2012-08-17 Online: 2012-09-21 Published: 2012-09-21

Abstract: Protein research plays an increasingly important role in science, and with the rapid development of biological science and technology the body of known protein information keeps growing in both volume and complexity. Finding the required protein information efficiently and precisely in this mass of data is therefore a problem that needs study. Taking NCBI and Binding DB as example data sources, this paper designs a method for integrating protein search results. Keywords are extracted from the retrieved entries to form 2-tuples (termed bigrams by the authors), and the entries are grouped accordingly. Within each group, finer-grained keywords are extracted and the entries are grouped again. The process repeats, extending the 2-tuples to N-tuples (N-grams), so that redundant information is removed and the remaining information is sorted and integrated.

Key words: protein search, keywords, bigram, information integration, NCBI, Binding DB
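The grouping loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how keywords are extracted or what the tuples contain, so everything here is an assumption made for illustration: entries are reduced to title strings, the tuple starts with the query term and gains one keyword per grouping level, redundancy means identical titles ignoring case, and the names `STOPWORDS`, `extract_keywords`, `group_entries`, and `max_len` are hypothetical. It is not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper does not say how keywords are chosen.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def extract_keywords(text, used):
    """Return lowercase words of `text` that are not stop words, digits, or already used."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return {w for w in words
            if w and not w.isdigit() and w not in STOPWORDS and w not in used}


def group_entries(entries, prefix, max_len=3):
    """Recursively group titles under keyword tuples that extend `prefix`.

    Returns a dict mapping each keyword tuple to a sorted list of titles.
    """
    # Assumed redundancy rule: titles that match ignoring case are one entry.
    unique = {}
    for title in entries:
        unique.setdefault(title.lower(), title)
    titles = sorted(unique.values())

    # Stop when the tuple is long enough or the group cannot be split further.
    if len(prefix) >= max_len or len(titles) <= 1:
        return {prefix: titles}

    used = set(prefix)
    keywords = {t: extract_keywords(t, used) for t in titles}
    counts = Counter(k for ks in keywords.values() for k in ks)

    groups = defaultdict(list)
    for t in titles:
        if keywords[t]:
            # Assign to the most frequent keyword; ties break alphabetically.
            best = min(keywords[t], key=lambda k: (-counts[k], k))
            groups[best].append(t)
        else:
            groups[None].append(t)

    result = {}
    for key, members in groups.items():
        if key is None:
            result[prefix] = members  # no finer keyword: keep at this level
        else:
            result.update(group_entries(members, prefix + (key,), max_len))
    return result


# Made-up titles standing in for merged search results from two sources.
results = [
    "Insulin receptor kinase domain",
    "insulin receptor kinase domain",  # same entry returned by a second source
    "Insulin receptor substrate 1",
    "Insulin-degrading enzyme",
]
groups = group_entries(results, ("insulin",))
for keys, titles in sorted(groups.items()):
    print(keys, titles)
```

Under these assumptions the duplicate title collapses into one entry, the two receptor entries share the 2-tuple `("insulin", "receptor")` before being split into 3-tuples, and the enzyme entry stays at the 2-tuple level because its group has a single member.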